Maximum Entropy Tiered Tagging

Author

  • Alexandru Ceauşu
Abstract

Data sparseness is a problem in tagging highly inflectional languages with large tagsets and scarce training resources, and it cannot be addressed using common tagging techniques alone. Tiered tagging is a two-stage technique: the first stage tags with a smaller "hidden" tagset, and the second stage recovers the original tagset using a lexicon and a set of hand-written rules. Recovery is possible only for words contained in the lexicon. The paper describes an experiment showing how the maximum entropy framework can be used for tiered tagging without a hand-written set of recovery rules, in a way that also works for unknown words.
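The lexicon-based recovery step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy tag format (`N-f-s-def`), the `reduce_tag` mapping that keeps only the first two fields, the tiny `LEXICON`, and the function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def reduce_tag(full_tag: str) -> str:
    """Map a full tag to a smaller 'hidden' tag.

    Toy reduction (an assumption): keep only the first two
    dash-separated fields, e.g. 'N-f-s-def' -> 'N-f'.
    """
    return "-".join(full_tag.split("-")[:2])


# Hypothetical lexicon: word form -> set of full tags it may carry.
LEXICON: Dict[str, Set[str]] = {
    "casa": {"N-f-s-def"},
    "case": {"N-f-p-ind", "N-f-s-ind"},
}


def recover(word: str, hidden_tag: str, lexicon: Dict[str, Set[str]]) -> List[str]:
    """Return the full tags of `word` that are compatible with `hidden_tag`.

    One candidate: the full tag is recovered directly.
    Several candidates: extra disambiguation is needed.
    No candidates: the word is not covered by the lexicon.
    """
    candidates = lexicon.get(word, set())
    return sorted(tag for tag in candidates if reduce_tag(tag) == hidden_tag)


if __name__ == "__main__":
    for word in ("casa", "case", "xyz"):
        print(word, recover(word, "N-f", LEXICON))
```

In this sketch a word with a single compatible lexicon entry is recovered directly, a word with several compatible entries still needs disambiguation, and a word missing from the lexicon yields no candidates at all. The last two cases are where the abstract says hand-written rules or, in the paper's experiment, a maximum entropy model come in.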

Similar resources

NING MA et al: FUSION OF WORD CLUSTERING FEATURES FOR TIBETAN PART OF SPEECH TAGGING

Tibetan Part of Speech (POS) tagging, the foundation of Tibetan natural language processing, assigns word classes according to the contextual information of words. Based on the maximum entropy framework, the paper studies the fusion of morphological features for Tibetan part-of-speech tagging with word clustering features. Experimental result...

A Maximum Entropy Tagger with Unsupervised Hidden Markov Models

We describe a new tagging model where the states of a hidden Markov model (HMM) estimated by unsupervised learning are incorporated as the features in a maximum entropy model. Our method for exploiting unsupervised learning of a probabilistic model can reduce the cost of building taggers with no dictionary and a small annotated corpus. Experimental results on English POS tagging and Japanese wo...

Reduction of Maximum Entropy Models to Hidden Markov Models

Maximum Entropy (maxent) models are an attractive formalism for statistical models of many types and have been used for a number of purposes, including language modeling (Rosenfeld 1994), part of speech tagging (Ratnaparkhi 1996), prepositional phrase attachment (Ratnaparkhi 1998), sentence breaking (Reynar and Ratnaparkhi 1997) and parsing (Ratnaparkhi 1997). Maxent models allow the combinatio...

Probabilistic Part Of Speech Tagging for Bahasa Indonesia

In this paper we report our work in developing Part of Speech Tagging for Bahasa Indonesia using probabilistic approaches. We use Conditional Random Fields (CRF) and Maximum Entropy methods in assigning the tag to a word. We use two tagsets containing 37 and 25 part-of-speech tags for Bahasa Indonesia. In this work we compared both methods using two different corpora. The results of the ex...

A Two-Stage Approach to Chinese Part-of-Speech Tagging

This paper describes a Chinese part-of-speech tagging system based on the maximum entropy model. It presents a novel two-stage approach to using the part-of-speech tags of the words on both sides of the current word in Chinese part-of-speech tagging. The system is evaluated on four corpora at the Fourth SIGHAN Bakeoff in the close track of the Chinese part-of-speech tagging task.

Publication date: 2006